Identification of Biomarkers Associated With Paget's Disease of Bone and Bone Metastasis From Breast Cancer Patients

ABSTRACT Background The bone is among the most frequently chosen sites for the metastatic spread of breast cancer. The prediction of biomarkers for BM (Bone Metastasis) and PDB (Paget's disease of bone) initiated from breast cancer could be critically important in categorizing individuals with a higher risk and providing targeted treatment for PDB and BM. Aims This research aims to investigate the common key candidate biomarkers that contribute to BM‐BCa (Bone metastasis of breast cancer) and PDB by employing network decomposition and functional enrichment studies. Methods and Results This research analyzed high‐throughput transcriptome sequencing (RNA‐Seq). For this work, the dataset (GSE121677) was downloaded from GEO (Gene Expression Omnibus), and DEGs were identified using Galaxy and R script 4.3. Using STRING (Search Tool for the Retrieval of Interacting Genes), high‐throughput research created a protein‐protein interaction network (PPIN). The BM‐PDB‐interactome was created using Cytoscape 3.9.1 and PDB biomarkers, with the top 3% DEGs from BM‐BCa. Functional Enrichment Analysis (Funrich 3.1.3) and DAVID 6.8 performed functional and gene set enrichment analysis (GSEA) of putatively essential biomarkers. TCGA (The Cancer Genome Atlas) validated the discovered genes. Based on our research, we identified 1262 DEGs; among these DEGs, 431 genes were upregulated, and 831 genes were downregulated. During the third growth of the interactome, 20 more genes were pinned to the BM‐PDB interactome. RAC2, PIAS1, EP300, EIF2S1, and LRP6 are among the additional 25% of genes identified to interact with the BM‐PDB interactome. To corroborate the findings of the research presented, additional functional and gene set enrichment analyses have been performed. Conclusion Of the five reported genes (RAC2, PIAS1, EP300, EIF2S1, and LRP6), RAC2 was identified to function as the common key potential biomarker in the BM‐PDB interactome analysis and validated by TCGA in the study presented.

BCa deaths will be prevented between 2020 and 2040 [2].BCa may arise in any breast area, like the ducts or lobules, and can spread to other body parts via blood and lymph vessels [3,4].Patients with BCa frequently develop bone metastasis (BM), and treating them is an essential and challenging element in the metastatic scenario.Pain, pathological fractures, and spinal cord compression are some of the clinical consequences of BM.
When cancer metastasizes to the bones, it is usually incurable, making it a devastating event in a patient's life.A significant proportion (70%) of patients with BCa experience osteolytic bone lesions, which contribute significantly to morbidity and mortality.Unfortunately, when BCa cells move to distant organs, traditional medicines (hormone therapy and chemotherapy) frequently fail to cure the illness [5].A frequent bone ailment called Paget's disease of the bone is characterized by a disordered remodeling of the bone.This condition can cause serious complications such as fissures, fractures, bone deformities, secondary osteoarthritis, deafness, and neurological problems, even though it is often asymptomatic.There is ample evidence of a remarkable decrease in PDB (Paget's disease of bone) mortality, clinical severity, prevalence, and incidence in many nations, especially in former British colonies [6,7].
A multitude of features of Paget's disease and BM in cancer patients is similar.Both are distinguished by a regional rise in osteoclast (OCL) production that results in the resorption of bone.Common mediators of increased OCL formation include interleukin-6 (IL-6) and receptor activator nuclear factor-kB ligand (RANKL) and the increased osteoclastogenic character of the bone microenvironment in PDB and BM [8].PDB, like BM, can be highly localized to a single bone or involve numerous bones.Similarly, bone scans can help both kinds of people spot lesions [9].
Biomarkers play a pivotal role in facilitating the timely identification of diseases, as well as assessing the efficacy of therapeutic interventions, among other pertinent applications [10,11].The discovery of evaluating connected interactions is becoming more important in the development of biomarkers associated with diseases that have been biologically altered through illness genesis, growth, or treatment.Understanding molecular etiology, risk assessment, illness classification, monitoring, therapeutic response, and toxicity evaluations rely on the relevance and study of biomarkers [12].Clinical bioinformatics is critical in identifying and validating diseasespecific biomarkers [13].Thus, it reveals that exploring one of the most frequent and severe bone disorders, PDB, and associating it with the common core indicators of BM from BCa may provide significant key biomarkers.Recent advancements in bioinformatics and network science analysis interpret highdimensional biological data using powerful computational methods [14,15].
The present study documents RNA-Seq data analysis to determine key biomarkers responsible for the association of BM-BCa with PDB.Finding biomarkers shared by the two disorders may significantly improve the ability to identify high-risk individuals and provide more targeted and efficient treatment plans.The study intends to address a significant gap in knowledge and establish a foundation for improved diagnostic and treatment approaches for patients with BCa-induced BM and PDB by examining these biomarkers using cutting-edge bioinformatics techniques.

| Materials and Methods
A succession of databases and tools were used to process the raw expression data and predict the outcome.The present study documents RNA-Seq data analysis to investigate the DEGs (differentially expressed genes) using the Galaxy platform and R script 4.3, and DEGs are validated by TCGA (The Cancer Genome Atlas) program.For PPI (protein-protein interaction) network analysis and to study the role and functional similarity, STRING and Cytoscape 3.9.1 were used.Further, it was investigated for GO (gene ontology) analysis and GSEA (gene set enrichment analysis) using DAVID 6.8 and FunRich 3.1.3to identify the biological pathways of the significant genes and to determine key biomarkers responsible for the association of BM-BCa with PDB.Gene expression profiles from the TCGA were used to validate the reported genes.Figure 1 depicts the schematic flow of the presented research.

| Data Retrieval
The NCBI GEO database (https://www.ncbi.nlm.nih.gov/geo/) was searched with the phrase "Bone metastasis AND breast cancer" that filters unique to an organism, that is, Homo sapiens.The GEO dataset GSE121677 contains information from six samples, three of which are BCa samples and three of which are BM-BCa samples.The collection was formed from spontaneous metastasis from the mammary fat pad to the bone marrow in the absence of exogenous estrogen.High throughput sequencing for gene expression profiling of GSE121677's RNA expression profile was determined using the GPL18573 Illumina HiSeq 500 (H.sapiens).The dataset GSE121677 is a vital source for this research project.DEGs linked to bone metastases of breast cancer (BM-BCa) can be found in this dataset thanks to high-throughput transcriptome sequencing (RNA-Seq) data.RNA-Seq data from primary BCa and BM were obtained from the SRA database (SRA study PRJNA498077) [16].The parameters for the dataset GSE121677 retrieved from GEO are listed in Table 1.

| Data Normalization and Processing
For GSE121677, raw FASTQ files were retrieved and analyzed using the Galaxy web interface platform (https://usega laxy.org/) to acquire raw counts, as the submitted input was  [17,18].The raw read's quality was assessed using FASTQC, and a FASTQC HTML report was generated [19].Second, Trimmomatic was then used to delete the adapter sequences [20].The adapter-trimmed reads were aligned to a reference genome with the sequence aligner HISAT2, with all other options set to default [21].Following alignment, the resulting BAM files were used to create raw counts for each RNA-Seq data sample using the featureCounts tool (Data S1) [22].

| Differential Expression Analysis
The raw counts of each of the SRA sample files (SRR8097383, SRR8097384, SRR8097385, SRR8097386, SRR8097387, SRR8097388, SRR8097389, SRR8097390, SRR8097391, SRR8097392, SRR8097393, SRR8097394, SRR8097395, SRR8097396, SRR8097397, SRR8097398, SRR8097399, SRR8097400, SRR8097401, SRR8097402, SRR8097403, SRR8097404, SRR8097405, and SRR8097406) of the GEO dataset GSE121677 were processed independently as a counts data file and col data file for quality control and differential gene expression analysis was performed by R script using the DESeq2 tool as a computational approach.R version 4.3 was used to annotate raw count data by mapping Entrez Gene IDs and Gene symbols.Gene symbols with no Entrez ID, no counts data, or duplicates were removed as the filter criteria.DEGs were screened based on significant p-value < 0.05 and |log 2 FC| > ±1.5 [23].

TNFRSF11A
Participates in the function of osteoclast mediation and analyzes metabolism of bone [25] TNFSF11 Acts as a crucial component in the process of differentiation and osteoclast activation, regulation of T cell-dependent immune response, and stimulates antiapoptotic kinase TNFRSF11B Mediator for lymph node development, osteoclast differentiation, and activity [28,29] [ 26,27] SPP1 Involves in the binding of a calcified matrix of bone to osteoclasts.The protein that has been encoded is released and exhibits a strong affinity for hydroxyapatite [30] SOST Loss-of-function mutations in this gene cause progressive bone overgrowth [31,32] DKK-1 Holds a crucial function in embryonic development and adult bone formation [30,32] SFRP-1 Acts as soluble modulators of Wnt signaling, epigenetic suppression of SFRP genes leads to dysregulated activation of the Wnt pathway, which is linked to cancer [30] CALCA Involves in calcium regulation, phosphorus metabolism, vasodilator, and acts as an antimicrobial peptide [33,34] CEACAM7 Activity of this gene may be reduced in colon and rectal cancers [35,36] CYFRA Used as a diagnostic biomarker for gastrointestinal cancers, regulates differentiation and polarity of the cell [35,37] CXCR4 It collaborates with the CD4 receptor to aid HIV entrance into the cells and is also abundant in the cancer cells [39] ACKR3 Acts as a receptor for vasoactive intestinal peptide (VIP), coreceptor for human immunodeficiency viruses (HIV) [39,40] IBSP Interacts with calcium and hydroxyapatite through clusters of acidic amino acid and enables attachment of cells via an RGD motif that identifies the vitronectin receptor [41] SIL1 Used as a nucleotide exchanging factor for a response from the other unfolded protein [30,42] POSTN Involves in tissue formation and regrowth, as well as cancer stem cell survival and metastasis [43] APD Acts as regulators of osteoclast development, lymphnode organogenesis, and vascular calcification [44,45]

| Validation by TCGA
TCGA, a project launched by the NCI (National Cancer Institute) in 2005, aims to uncover genetic alterations linked to cancer through comprehensive genome-wide sequencing and advanced analytics.Over 20 000 primary cancer and matched normal samples covering 33 cancer types have been molecularly characterized by TCGA.The TCGA database comprises plenty of specimens of tissue that are analyzed through a variety of methods, including genomic expression.Furthermore, the data are publicly accessible to all researchers for their particular investigations.Therefore, the present study utilized the extensive TCGA RNA-Seq dataset.CMI-MBC (Count Me In: The Metastatic Breast Cancer) project was retrieved to gather data on BCa metastasis [24].The retrieved RNA-Seq data contain 200 BCa metastatic samples, including 168 primary tumors and 32 metastasis samples.

| Construction of the PPI Network
Using a web-based database called STRING, a PPIN of DEGs was created, which is intended for assessing PPI interactome analysis [46].Using Cytoscape 3.9.1, an interactome of PDB biomarkers and the top 3% of BM DEGs (the BM-PDB interactome) was created to scrutinize the PPI network interactive analysis.
In addition to Cytoscape, Clusterviz plugin was installed to find multiple clusters in a network.Clusterviz fascinates on the extensively utilized algorithms, namely, MCODE, FAG-EC, and EAGLE.MCODE's algorithm was used in the present study because it finds extremely connected fragments in an area of the network to build biological networks using correlation analysis, which has three processes: vertex weighting, postprocessing, and complex prediction in filtering or adding proteins in the generated complex.A web composed of interconnected molecules be visualized graphically, with nodes and edges, respectively.MCODE employs strategies based on the coefficient of clustering to determine a vertex's neighborhood to locate locally dense sections in a network.A clique is defined as the most linked network using all the MCODE options to default [47].Here, Tables 3 and 4 demonstrate a list of the top 40 up and downregulated DEGs.

| Integrated Enrichment Analysis of Candidate Key Biomarkers
A comprehensive study involving functional and gene set pathway enrichment analysis was conducted in the present study.The identified DEGs were considered for labeling functional annotations and enrichment analysis of the candidate genes by GO analysis [48].DAVID 6.8, a functional enrichment tool, was used to perform enrichment analysis [49].The study concentrated primarily on functional annotations of GO, including the ontologies of cellular components (CCs), biological processes (BP), and molecular function (MF).GO terms with (gene count ≥2) were declared statistically significant [50,51].The identified DEGs were matched to protein databases and were performed Functional Enrichment Analysis (Funrich 3.1.3)[52].Moreover, on the other hand, the GSEA was also done to identify significant biological pathways shown by RNA-Seq [51,52].

| Identification and Screening of DEGs
The 24 SRA samples were obtained from the NCBI GEO Dataset GSE121677, which included approximately 12 primary BCa samples and the remaining 12 bone metastatic BCa  samples processed through the platform.FASTQC analyzed the raw data quality, and an HTML report was generated.The adapter sequences were subsequently deleted using Trimmomatic.The adapter-trimmed reads were aligned to the HISAT2 sequence aligner.The generated BAM files were then used to create raw counts for each RNA-Seq data sample using the tool named featureCounts.By analyzing the RNA-Seq read counts of the various genes with R script 4.3 using the computational approach and then DEGs were screened with the significant cut-off criteria: p-value < 0.05 and |log 2 FC| > ±1.5, and we identified 1262 genes as DEGs from the GSE121677 dataset, including 431 upregulated and 831 downregulated genes.The total 1262 genes based on the significant cut-off criteria are provided in Data S2, and the MA (moving average) plot is shown in Figure 2.

| TCGA Validation
The TCGA gene expression databases were utilized for determining the information about gene expression in the study presented.The retrieval of RNA-Seq data, which contains 200 BCa metastatic samples, including 168 primary tumor samples and 32 metastasis samples, has been considered to yield 2550 DEGs according to the specified criteria p-value < 0.05 and |log 2 FC| > ±1.5.The results have been linked to the Human GRCh38/hg38 chromosomal standardized reference.Of the top 40 DEGs chosen, 47% are considered to be highly expressed by TCGA.

| Network Analysis: BM-PDB Interactome
The PPIN of DEGs, containing BM-PDB interactome network of which 45 and 42 are nodes and edges, respectively, of the analyzed RNA-Seq data was submitted to STRING (Figure 3).Due to 23 seed genes exhibiting direct interactions, gene interactions were noted using Cytoscape 3.9.1 until the third growth.As of the third growth, the enlarged PPI network has 65 nodes and 121 edges, in which Cytoscape 3.9.1 was used to differentiate between the proposed 23 seed genes (DEGs and PDB biomarkers) and the 20 additional genes introduced up to the third growth (Figure 4).For further GO analysis, these five (RAC2, PIAS1, EP300, EIF2S1, and LRP6) of the 20 additional genes were chosen as these genes were observed to be highly connected with DEGs of BM-BCa and PDB in BM-PDB interactome.The clusters (A, B, C, and D) formed were based on the plugin ClusterViz of the candidate key biomarkers of the BM-PDB interactome (Figure 5).

| Integrated Enrichment Analysis of Candidate Key Biomarkers
The integrated enrichment analysis, which includes functional and biological pathway enrichment analyses, was carried out.DAVID 6.8 and FunRich 3.1.3were used to perform functional enrichment analysis, including GSEA, to obtain further insight into the function of identified genes.Separate enrichment studies of the five genes were carried out (Data S3 and S4).

| Functional Enrichment Analysis
The gene abundance was primarily focused on the GO analysis and the biological terms (MF, BP, and CC).In CC, the top percentage of genes were enriched in the cytoplasm, that is, 100% (Figure 6A).In MF, the top percentage of genes were enriched in transcription regulator activity, which is 40% (Figure 6B).In contrast, in BP, the top percentage of genes were enriched in cell communication and signal transduction, which is 40% (Figure 6C).

| Biological Pathway Enrichment Analysis
The biological pathway enrichment analysis was carried out in order to identify the biological pathways of the five key RNA-Seq candidate vital biomarkers.Table 5 describes the details of the GO and biological pathway functions of five distinct key candidate genes.

| Discussion
This present research reports an extensive study into the efficacy of interactome investigations to predict clinical outcomes for early prognosis, therapy, and therapeutic response to a specific disease utilizing candidate key biomarkers [59].Network science provides a framework for studying the biological and molecular mechanisms underlying human diseases; here, network studies are conducted for interactive analysis in BM-BCa with the PDB.RNA-Seq technology has recently been widely employed to uncover molecular targets for a particular disease [59,60].The utilization of identical key biomarkers for PDB in the presented work plays a significant role in BM-PDB interactome analysis [60].
The five potential biomarkers that have been identified were subjected to a comprehensive literature survey in order to elucidate their association with PDB and BM.According to the analysis, RAC2, the reported gene that encodes a small guanosine triphosphate (GTP)-metabolizing protein from the Ras superfamily, is involved in signal transduction in a variety of cellular activities such as chemotaxis, cytoskeletal rearrangement, cellular differentiation, and proliferation.It was also found to have a unique genetic relationship to Crohn's disease [61].The gene RAC2 with node score of 2.0 was observed to be in first cluster in ClusterViz results where RAC2 shows direct interactions with S100A9 (downregulated DEG with fold change: −6.61), S100A8 (downregulated DEG with fold change: −7.43), ARHGDIB (upregulated DEG with fold change: 3.72), and CXCR4 (PDB biomarker), all these four genes were reported as DEGs in TCGA validation.All these genes interacting with RAC2 have been observed in a common pathway, that is, Wnt signaling pathway and pathways of cancer, when performed GO.
PIAS1, the second reported gene, encodes a protein inhibitor of activated STAT (PIAS) members.By directing the sumoylation of target proteins, PIAS1 proteins act as SUMO E3 ligases and play crucial roles in numerous physiological processes [62].EP300, another reported gene, encodes the transcriptional coactivator p300 protein associated with adenovirus E1A [63].Both genes PIAS1 and EP300 were present in cluster 2, constructed by ClusterViz, with node scores of 3.0 and 2.7, respectively.Gene PIAS1 and EP300 show direct interactions with two added genes (IRF1 and CREBBP) and one downregulated gene, STAT1, with fold change: −7.28.Also, it is reported that all the genes of cluster 2 are involved in the JAK-STAT signaling pathway in KEGG.
EIF2S1, another reported gene, promotes the first controlled stage in the start of protein synthesis, encouraging initiator tRNA binding to subunits of 40S ribosomes.This gene has also been linked to neurodegenerative disorders [64].The gene EIF2S1 with node score of 4.0 was observed to be in the third cluster in ClusterViz results where EIF2S1 shows direct interactions with EIF1 (upregulated DEG with fold change: 5.95), three added genes EIF5, EIF2S3, and EIF3J.All these genes observed in cluster 3 have been reported in a common KEGG pathway, that is, RNA transport.
LRP6 is the fifth reported gene that encodes for low-density lipoprotein (LDL) receptors.This gene has also been associated with coronary artery disease [65].The gene LRP6, with a node score of 5.0, was observed to be in the fourth cluster in ClusterViz results, where LRP6 shows direct interactions with PDB biomarkers and one added gene, whereas it shows indirect interactions with DEGs of BM.All these genes interacting with the LRP6 have been observed in the Wnt signaling pathway when performed GO.
As demonstrated in our research, all the reported genes show direct or indirect interactions with BM-PDB.Of the five reported genes, the first cluster of the RAC2 gene plays a distinct role in the BM-PDB interactome analysis as it is the only gene that shows direct interaction with the DEGs and PDB biomarkers and is also validated by TCGA.The key benefit of the present study is that it makes use of the BM-PDB interactome to build and study the network clusters of five distinct key biomarkers.The interactions between PPI networks are derived in this work for the investigation of one of the most prevalent and severe bone illnesses, PDB, and linking them with the usual core signs of BM-BCa may give substantial biomarkers [66].The biomarkers revealed in the study will provide a dynamic and powerful method for comprehending the BM-PDB illness spectrum, with applications in randomized clinical trials, screening, diagnosis, and prognosis [67].Biomarkers can also represent the complete disease spectrum, from early signs to late stages.Our findings might be used in future studies to establish gene expression-based prediction models that may also benefit from their use in clinical practice [68].

| Conclusion
The NCBI GEO Dataset GSE121677, which depicts BM-BCa, was used in this research to identify common potential biomarkers for PDB and BM-BCa.The current study detected 431 and 831 up and downregulated DEGs from the retrieved dataset.Out of them, the top 3% DEGs with 16 significant PDB biomarkers based on literature mining were then submitted to STRING and Cytoscape 3.9.1 to be further studied for PPI interactome analysis, which identified the common five distinct candidate key biomarkers until the third growth due to the main seed genes showing direct interactions.These five (RAC2, PIAS1, EP300, EIF2S1, and LRP6) reported genes were chosen for further GO and GSEA analysis and were most abundant for functional enrichment analysis in "cytoplasm, transcription regulator activity, cell communication, and signal transduction" whereas, in pathway enrichment analysis, they were prevalent in "Wnt signaling pathway, pathways in cancer, adherens junction, Hepatitis C, JAK-STAT signaling pathway, Influenza A." All five reported genes were utilized in ClusterViz, which illustrates the clusters of the five distinct putative key biomarkers along with their first neighbors.Of these, the RAC2 gene acts as a key candidate biomarker that shows direct interaction with the DEGs and PDB biomarkers and is also validated by TCGA.Thus, this finalized reported gene might lead to new insights into molecular disease pathways, paving the way for the creation of reasonable personalized treatment approaches to develop gene expression prediction and prognosis models of a particular disease.It is necessary to validate the presented biomarkers through in vitro and in vivo research, which can potentially be conducted in subsequent studies.

FIGURE 1 |
FIGURE 1 | Schematic representation of the present study.

FIGURE 3 |
FIGURE 3 | BM-PDB interactome network (red color nodes: DEGs and blue color nodes: PDB biomarkers) constructed by STRING to observe the interactions between PDB and BM-related DEGs.

FIGURE 4 |
FIGURE 4| Network extension of the BM-PDB interactome with the added genes (red color nodes: up and downregulated DEGs, blue color nodes: PDB biomarkers, and green color nodes: added genes).

FIGURE 5 |
FIGURE 5 | Four clusters (A, B, C, and D) developed by ClusterViz for the distinct BM-PDB interactome.Here, red color nodes are DEGs, blue color nodes are PDB biomarkers, and green color nodes are added genes.

FIGURE 6 |
FIGURE 6 | Three pie charts (A, B, and C) developed by FunRich 3.1.3for the gene ontology analysis.Here, (A) cellular component for gene ontology, (B) molecular function for gene ontology, and (C) biological process for gene ontology.

TABLE 2 |
List of known biomarkers for the PDB with reported functions.

TABLE 3 |
List of the top 20 upregulated genes.

TABLE 4 |
List of top 20 downregulated genes.